Abstract

Seminal works [8, 14, 15] generated massive interest in studying linear under-determined systems with
sparse solutions. In this paper we give a short mathematical overview of what has been accomplished in the
last 10 years in one particular direction of that study. We then discuss what we consider to have been the
main challenges of the last 10 years and give our own view as to what the main challenges ahead are.
Through the presentation we arrive at a point where the following natural rhetorical question arises: is it
time to redirect the main challenges? While we cannot provide the answer to such a question, we hope that
our small discussion will stimulate further considerations in this direction.

Index Terms: Linear systems of equations; sparse solutions; $\ell_1$-optimization.
1 Introduction

In this paper we will be interested in studying under-determined systems of linear equations with sparse
solutions. We start by looking at the mathematical formulation of such a problem, which has attracted
enormous attention in recent years. The problem is essentially the following. Let $\tilde{x}$ be an
$n$-dimensional vector from $\mathbb{R}^n$. Moreover, let $\tilde{x}$ be $k$-sparse (by $k$-sparse we mean
vectors that have at most $k$ components that are not equal to zero; clearly $k \leq n$). Let $A$ be an
$m \times n$ matrix from $\mathbb{R}^{m \times n}$. We will call $A$ the system or the measurement matrix and
throughout the rest of the paper assume that $A$ is a full rank matrix (on occasions when $A$ happens to be
random we will assume that $A$ is of full rank with overwhelming probability, where by overwhelming
probability we mean a probability that is no more than a number exponentially decaying in $n$ away from 1).
Now, the question of interest is: given $A$ and $A\tilde{x}$, can one find $x$ such that

$$Ax = A\tilde{x}. \qquad (1)$$

Fairly often one refers to $A\tilde{x}$ in (1) as a known vector $y$ from $\mathbb{R}^m$. In other words, if
one rewrites the system given in (1) in a more natural way

$$Ax = y, \qquad (2)$$

then $y$ is implied to be constructed as the product of matrix $A$ and a $k$-sparse vector $\tilde{x}$. We
will often in the rest of the paper use the expression "solve the system given in (2)". By that we will mean
that if $\hat{x}$ is the solution we found by solving (2) using any available methodology then
$\hat{x} = \tilde{x}$.

The problem stated in (1) (or in (2)) is very simple. In fact, as we mentioned above, it is nothing but a
system of linear equations that is additionally assumed to have a sparse solution. As is usually the case
with linear systems, a critical piece of information that enables one to solve the problem is the relation
between $k$, $m$, and $n$. Clearly, if $m \geq n$ the system is either over-determined or just determined and
solving it could be a bit easier. On the other hand, if $m < n$ the system is under-determined and in general
may not have a unique solution. However, if $k < m$ one may still be able to figure out what $\tilde{x}$ is.
Clearly, if one knows a priori the value of $k$ one can then search over all subsystems obtained by
extracting $k$ columns from matrix $A$. Of course, such an approach would probably solve the problem but it
is very complex when $n$ and $k$ are large. To be a bit more specific (as well as to make all our points in
the paper clearer) we will in the rest of the paper assume the so-called linear regime, i.e. the regime where
$k$, $m$, and $n$ are all large but proportional to each other. To be even more specific we will assume that
the constants of proportionality are $\beta$ and $\alpha$, i.e. we will assume that $\beta = \frac{k}{n}$ and
$\alpha = \frac{m}{n}$. Under such an assumption the complexity of the above mentioned strategy of extracting
all subsystems with $k$ columns would be exponential. Instead of such a simple strategy one can employ a host
of more sophisticated approaches. Since this paper has its own objective we will not present all known
approaches. Instead we will try to shorten our presentation in that regard and focus only on what we consider
the most popular/well-known/successful ones. Moreover, since studying any of these approaches is nowadays
pretty much a theory on its own we will most often just give the core information and leave more
sophisticated discussion to overview-type papers.
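
To make the cost of the exhaustive strategy mentioned above concrete, the following is a minimal Python
sketch (assuming NumPy is available). The function name `exhaustive_sparse_solve`, the residual tolerance,
and the toy dimensions are illustrative choices of ours and do not come from any of the cited works.

```python
import itertools
import math

import numpy as np


def exhaustive_sparse_solve(A, y, k, tol=1e-9):
    """Try every k-column subsystem of A; return the first exact fit, else None."""
    m, n = A.shape
    for support in itertools.combinations(range(n), k):
        cols = list(support)
        # Least-squares fit restricted to the chosen k columns.
        coef, *_ = np.linalg.lstsq(A[:, cols], y, rcond=None)
        if np.linalg.norm(A[:, cols] @ coef - y) <= tol * max(1.0, np.linalg.norm(y)):
            x = np.zeros(n)
            x[cols] = coef
            return x
    return None


# Toy instance (hypothetical sizes, small enough for the search to finish).
rng = np.random.default_rng(0)
m, n, k = 6, 12, 2
A = rng.standard_normal((m, n))
x_true = np.zeros(n)
x_true[rng.choice(n, size=k, replace=False)] = rng.standard_normal(k)
x_hat = exhaustive_sparse_solve(A, A @ x_true, k)

print("recovered:", x_hat is not None and np.allclose(x_hat, x_true))
print("subsystems for n=12, k=2:", math.comb(12, 2))
print("digits in the count for n=1000, k=100:", len(str(math.comb(1000, 100))))
```

The number of subsystems is $\binom{n}{k}$, which in the linear regime grows exponentially in $n$; the last
line of the sketch only counts them, since enumerating them is already hopeless at that size.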

We start by emphasizing that one can generally distinguish two classes of possible algorithms that can be
developed for solving (1). The first class of algorithms assumes freedom in designing matrix $A$. Such a class
is already a bit different from what can be employed for solving (1) or (2). Namely, as we mentioned right
before (1) and (2), our original setup assumes that we are given a matrix $A$. Still, to maintain completeness
of the exposition we briefly mention this line of work, and do so especially since what can be achieved within
such an approach seems to be substantially better than what can be achieved within our setup.

So, if one has the freedom to design matrix $A$ then the results from [3, 32, 36] demonstrated that the
techniques from coding theory (based on the coding/decoding of Reed-Solomon codes) can be employed to
determine any $k$-sparse $\tilde{x}$ in (1) for any $0 < \alpha \leq 1$ and any $\beta \leq \frac{\alpha}{2}$
in polynomial time (it is relatively easy to show that under the unique recoverability assumption $\beta$
cannot be greater than $\frac{\alpha}{2}$). Therefore, as long as one is concerned with the unique recovery of
a $k$-sparse $\tilde{x}$ in (1) in polynomial time the results from [3, 32, 36] are optimal. The complexity of
the algorithms from [3, 32, 36] is roughly $O(n^3)$. In a similar fashion one can, instead of using
coding/decoding techniques associated with Reed-Solomon codes, design the matrix and the corresponding
recovery algorithm based on the techniques related to the coding/decoding of Expander codes (see e.g.
[29, 30, 50] and references therein). In that case recovering $\tilde{x}$ in (1) is significantly faster for
large dimensions $n$. Namely, the complexity of the techniques from e.g. [29, 30, 50] (or their slight
modifications) is usually $O(n)$, which is clearly for large $n$ significantly smaller than $O(n^3)$. However,
the techniques based on coding/decoding of Expander codes usually do not allow for $\beta$ to be as large as
$\frac{\alpha}{2}$.

The main interest of this paper, however, will be the algorithms from the second class. Within the second
class are the algorithms that should be designed without having the choice of $A$ (instead, as mentioned
right before (1) and (2), matrix $A$ is rather given to us). Designing the algorithms from the second class
is substantially harder compared to the design of the algorithms from the first class. The main reason for
the hardness is that when there is no choice in $A$ the recovery problem (1) becomes NP-hard. The following
three algorithms (and their different variations) we currently view as solid heuristics for solving (1):

1. Orthogonal matching pursuit - OMP

2. Basis pursuit - $\ell_1$-optimization

3. Approximate message passing - AMP

We do however mention that the third technique, which is based on belief propagation type of algorithms,
has been emerging as a strong alternative in recent years. While it does not have as strong a historical
background as the other two (at least when it comes to solving (1)), its great performance features as well as
its relatively easy implementation make it particularly attractive. Under certain probabilistic assumptions on
the elements of $A$ it can be shown (see e.g. [35, 46, 47]) that if $m = O(k \log(n))$ OMP (or slightly
modified OMP) can recover $\tilde{x}$ in (1) with complexity of recovery $O(n^2)$. On the other hand, a
stage-wise OMP from [22] recovers $\tilde{x}$ in (1) with complexity of recovery $O(n \log n)$. Somewhere in
between OMP and BP are recent improvements CoSAMP (see e.g. fST]) and Subspace pursuit (see e.g. [12]), which
guarantee (assuming the linear regime) that the $k$-sparse $\tilde{x}$ in (1) can be recovered in polynomial
time with $m = O(k)$ equations.
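
For readers who have not seen OMP written out, the following is a minimal Python sketch of the basic greedy
iteration (assuming NumPy). It is a textbook-style illustration, not the exact variants analyzed in the works
cited above; the function name `omp`, the column normalization, and the toy sizes are our own assumptions.

```python
import numpy as np


def omp(A, y, k):
    """Greedy OMP sketch: pick k columns one at a time, refit by least squares."""
    m, n = A.shape
    col_norms = np.linalg.norm(A, axis=0)
    support = []
    residual = y.copy()
    coef = np.zeros(0)
    for _ in range(k):
        # Score each column by its normalized correlation with the residual.
        scores = np.abs(A.T @ residual) / col_norms
        if support:
            scores[support] = -np.inf  # never pick the same column twice
        support.append(int(np.argmax(scores)))
        # Refit on all selected columns and update the residual.
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    x = np.zeros(n)
    x[support] = coef
    return x


# Toy instance (hypothetical sizes, chosen to be comfortably easy for OMP).
rng = np.random.default_rng(1)
m, n, k = 120, 200, 3
A = rng.standard_normal((m, n))
x_true = np.zeros(n)
idx = rng.choice(n, size=k, replace=False)
x_true[idx] = rng.choice([-1.0, 1.0], size=k) * rng.uniform(1.0, 2.0, size=k)
x_hat = omp(A, A @ x_true, k)
rel_err = np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true)
print("relative error:", rel_err)
```

Each pass costs one matrix-vector product plus a small least-squares solve, which is where the low
polynomial complexity quoted above comes from.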

2 $\ell_1$-optimization

We will now further narrow down our interest to only the performance of $\ell_1$-optimization. (Variations
of the standard $\ell_1$-optimization from e.g. [10, 11, 40] as well as those from [13, 23, 26, 39] related to
$\ell_q$-optimization, $0 < q < 1$, are possible as well.) The basic $\ell_1$-optimization algorithm offers an
$x$ in (1) as the solution of the following $\ell_1$-norm minimization problem

$$\min \|x\|_1 \quad \mbox{subject to} \quad Ax = y. \qquad (3)$$

Due to its popularity the literature on the use of the above algorithm is rapidly growing. Below we restrict
our attention to the two works related to (3) that are, in our mind, the most influential.
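
Problem (3) is a linear program in disguise, which is what makes it solvable in polynomial time. The sketch
below is one standard way to see this, assuming NumPy and SciPy are available: write $x = u - v$ with
$u, v \geq 0$ and minimize the sum of the entries of $u$ and $v$. The function name `basis_pursuit` and the
toy sizes are our own illustrative choices.

```python
import numpy as np
from scipy.optimize import linprog


def basis_pursuit(A, y):
    """Solve min ||x||_1 subject to Ax = y via the LP split x = u - v, u, v >= 0."""
    m, n = A.shape
    cost = np.ones(2 * n)            # sum(u) + sum(v) equals ||x||_1 at the optimum
    A_eq = np.hstack([A, -A])        # A u - A v = y
    res = linprog(cost, A_eq=A_eq, b_eq=y, bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:n] - res.x[n:]


# Toy instance (hypothetical sizes, well inside the region where (3) is expected to work).
rng = np.random.default_rng(2)
m, n, k = 40, 80, 4
A = rng.standard_normal((m, n))
x_true = np.zeros(n)
x_true[rng.choice(n, size=k, replace=False)] = rng.standard_normal(k)
x_hat = basis_pursuit(A, A @ x_true)
print("max abs error:", np.max(np.abs(x_hat - x_true)))
```

A general-purpose LP solver is used here only for clarity; dedicated solvers for (3) exist and are typically
much faster at large $n$.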

The first one is [8], where the authors were able to show that if $\alpha$ and $n$ are given, $A$ is given and
satisfies the restricted isometry property (RIP) (more on this property the interested reader can find in e.g.
UmilTlllllSll), then any unknown vector $\tilde{x}$ with no more than $k = \beta n$ (where $\beta$ is a
constant dependent on $\alpha$ and explicitly calculated in [8]) non-zero elements can be recovered by solving
(3). As expected, this assumes that $y$ was in fact generated by that $\tilde{x}$ and given to us. The case
when the available $y$'s are noisy versions of real $y$'s is also of interest [8, 9, 28, 49]. Although that
case is not of primary interest in the present paper it is worth mentioning that the recent popularity of
$\ell_1$-optimization in the field of compressed sensing (where problem (1) is one of key importance) is
significantly due to its robustness with respect to noisy $y$'s. (Of course, the main reason for its
popularity is its ability to solve (1) for a very wide range of matrices $A$; more on this universality from a
statistical point of view the interested reader can find in [21].)

However, the RIP is only a sufficient condition for $\ell_1$-optimization to produce the $k$-sparse solution
of (1). Instead of characterizing $A$ through the RIP condition, several alternative routes have been
introduced in recent years. Among the most successful ones are those from e.g. [14, 15, 17, 18, 42, 44] and we
will revisit them below. However, before revisiting these approaches we should mention that it was observed
fairly early that if matrix $A$ and vector $\tilde{x}$ are deterministic (and hence can always be chosen so as
to make the solution of (3) as far away as possible from $\tilde{x}$) then it is highly unlikely that (3)
would be of much help in providing a provably fast (say, polynomial) way for "guaranteed" solving of (1).
Having this in mind the shift to statistical $A$ and/or $\tilde{x}$ happened fairly quickly. The idea for such
a shift can be summarized in the following way: if it is not possible to recover $\tilde{x}$ in (1) by solving
(3) for all $A$ and $\tilde{x}$ then maybe it can still be possible for an overwhelming majority of them. A
way to characterize such an overwhelming majority is then to introduce randomness on $A$ and/or $\tilde{x}$.
For example, if $A$ is a random matrix one may be able to say that if a concrete $A$ (i.e., its elements) in
(1) is drawn from a probabilistic distribution then maybe for such an $A$ the solution of (3) is often
$\tilde{x}$. This is a somewhat standard way of attacking NP-hardness (there are of course more sophisticated
ways, but in this paper we will look just at this basic premise).



2.1 Geometric approach to $\ell_1$-optimization

In [14, 15] Donoho revisited the $\ell_1$-optimization technique from (3) and looked at its geometric
properties/potential. Namely, in [14, 15] Donoho considered the polytope obtained by projecting the regular
$n$-dimensional cross-polytope $C_p^n$ by $A$. He then established that the solution of (3) will be the
$k$-sparse solution of (1) (i.e., it will be $\tilde{x}$) if and only if $AC_p^n$ is centrally $k$-neighborly
(for the definitions of neighborliness, details of Donoho's approach, and related results the interested
reader can consult the by now classic references [14, 15, 17, 18]). In a nutshell, relying on a long line of
geometric results from [2, 6, 33, 37, 48], in [15] Donoho showed that if $A$ is a random $m \times n$
ortho-projector matrix then with overwhelming probability $AC_p^n$ is centrally $k$-neighborly (as mentioned
earlier, by overwhelming probability we in this paper mean a probability that is no more than a number
exponentially decaying in $n$ away from 1). Miraculously, [14, 15] provided a precise characterization of $m$
and $k$ (in a large dimensional context) for which this happens.

Before presenting the details of Donoho's findings we should make a few clarifications. Namely, it
should be noted that one usually considers success of (3) in recovering any given $k$-sparse $\tilde{x}$ in
(1). It is also of interest to consider success of (3) in recovering almost any given $\tilde{x}$ in (1).
Below we make a distinction between these cases and recall some of the definitions from
[15, 17, 19, 20, 43, 44].

Clearly, for any given constant $\alpha \leq 1$ there is a maximum allowable value of $\beta$ such that for any
given $k$-sparse $\tilde{x}$ in (1) the solution of (3) is with overwhelming probability exactly that given
$k$-sparse $\tilde{x}$. We will refer to this maximum allowable value of $\beta$ as the strong threshold (see
[15]). Similarly, for any given constant $\alpha \leq 1$ and any given $\tilde{x}$ with a given fixed location
of non-zero components and a given fixed combination of its elements' signs there will be a maximum allowable
value of $\beta$ such that (3) finds that given $\tilde{x}$ in (1) with overwhelming probability. We will
refer to this maximum allowable value of $\beta$ as the weak threshold and will denote it by $\beta_w$ (see,
e.g. [43, 44]). What we present below are essentially Donoho's findings that relate to $\beta_w$.

Knowing all of this, we can then state what was established in [15]. If $A$ is a random ortho-projector,
$AC_p^n$ will be centrally $k$-neighborly with overwhelming probability if

$$n^{-1}\log\left(C_{com}\, C_{int}(T^k, T^m)\, C_{ext}(F^m, C_p^n)\right) < 0, \qquad (4)$$

where $C_{com} = 2^{m-k}\binom{n-k-1}{m-k}$, $C_{int}(T^k, T^m)$ is the internal angle at face $T^k$ of $T^m$,
$C_{ext}(F^m, C_p^n)$ is the external angle of $C_p^n$ at any $m$-dimensional face $F^m$, and $T^k$ and $T^m$
are the standard $k$ and $m$ dimensional simplices, respectively (more on the definitions and meaning of the
internal and external angles can be found in e.g. [27]). Donoho then proceeded by establishing that (4) is
equivalent to the following inequality related to the sum/difference of the exponents of $C_{com}$,
$C_{int}$, and $C_{ext}$:



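$$\Psi_{net} = \Psi_{com} - \Psi_{int} - \Psi_{ext} < 0, \qquad (5)$$

where

$$\Psi_{com}(\beta, \alpha) = (\alpha - \beta)\log(2) + (1-\beta) H\left(\frac{\alpha-\beta}{1-\beta}\right)$$
$$\Psi_{int}(\beta, \alpha) = -n^{-1}\log(C_{int}(T^k, T^m))$$
$$\Psi_{ext}(\beta, \alpha) = -n^{-1}\log(C_{ext}(F^m, C_p^n)) \qquad (6)$$

and $H(p) = -p\log(p) - (1-p)\log(1-p)$ is the standard entropy function and
$\binom{n}{pn} \approx e^{nH(p)}$ is the standard approximation of the binomial factor by the entropy function
in the limit of $n \rightarrow \infty$. Moreover, Donoho also provided a way to characterize
$\Psi_{int}(\beta, \alpha)$ and $\Psi_{ext}(\beta, \alpha)$. Let $\gamma = \frac{\beta}{\alpha}$ and for
$s \geq 0$ let

$$\Phi(s) = \frac{1}{\sqrt{2\pi}}\int_s^{\infty} e^{-\frac{x^2}{2}}\,dx, \qquad \phi(s) = \frac{1}{\sqrt{2\pi}} e^{-\frac{s^2}{2}}. \qquad (7)$$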

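Then one has

$$\Psi_{int}(\beta, \alpha) = (\alpha - \beta)\xi_{\gamma}(y_{\gamma}) + (\alpha - \beta)\log(2) \qquad (8)$$

where

$$\xi_{\gamma}(y_{\gamma}) = -\frac{1}{2} y_{\gamma}^2 \frac{1-\gamma}{\gamma} - \frac{1}{2}\log\left(\frac{2}{\pi}\right) + \log\left(\frac{y_{\gamma}}{\gamma}\right), \qquad y_{\gamma} = \frac{\gamma}{1-\gamma} s_{\gamma}, \qquad (9)$$

and $s_{\gamma} \geq 0$ is the solution of

$$\Phi(s) = (1-\gamma)\frac{\phi(s)}{s}. \qquad (10)$$

On the other hand the expression for $\Psi_{ext}(\beta, \alpha)$ is a bit simpler:

$$\Psi_{ext}(\beta, \alpha) = \min_{y \geq 0}\left(\alpha y^2 - (1-\alpha)\log(\mbox{erf}(y))\right). \qquad (11)$$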

Using (6), (7), (8), (9), (10), and (11) one then, for a fixed $\alpha$, finds the largest $\beta$ such that
the left-hand side of (5) is basically zero. Such a $\beta$ is what we termed above $\beta_w$. While the above
characterization of the optimal $\beta_w$ (as a function of $\alpha$) is not super simple, it is truly
fascinating that it actually ends up being exact.
We summarize the above results in the following theorem.

Theorem 1. (Exact $\ell_1$ $(\beta_w, \alpha_w)$ threshold; geometric approach of [14, 15]) Let $A$ in (1) be
an $m \times n$ ortho-projector (or an $m \times n$ matrix with the null-space uniformly distributed in the
Grassmannian). Let $k, m, n$ be large and let $\alpha_w = \frac{m}{n}$ and $\beta = \frac{k}{n}$ be constants
independent of $m$ and $n$. Let $\Psi_{com}(\beta, \alpha_w)$, $\Psi_{int}(\beta, \alpha_w)$, and
$\Psi_{ext}(\beta, \alpha_w)$ be evaluated for a pair $(\beta, \alpha_w)$ through the expressions given in
(6), (8), and (11). Let then $\beta_w$ be the maximal $\beta$ for which (5) holds. Then:

1) With overwhelming probability polytope $AC_p^n$ will be centrally $\beta n$-neighborly for any
$\beta < \beta_w$.

2) With overwhelming probability polytope $AC_p^n$ will not be centrally $\beta n$-neighborly for any
$\beta > \beta_w$.

Moreover, let $\tilde{x}$ in (1) be $k$-sparse. Then:

1) If $\beta < \beta_w$ then with overwhelming probability for almost any $\tilde{x}$, the solution of (3) is
exactly that $\tilde{x}$.

2) If $\beta > \beta_w$ then with overwhelming probability for almost any $\tilde{x}$, the solution of (3) is
not that $\tilde{x}$.

Proof. Follows from considerations presented in [14, 15]. $\Box$

In the following subsection we present a related collection of results that were obtained in a series of our
own works [42]-[44], attacking performance analysis of (3) through a probabilistic approach.

2.2 Purely probabilistic approach to $\ell_1$-optimization

In our own work [44] we introduced a novel probabilistic framework for performance characterization of
(3) (the framework seems rather powerful; in fact, we found hardly any sparse type of problem that the
framework was not able to handle with almost impeccable precision). Using that framework we obtained
lower bounds on $\beta_w$. These lower bounds were in excellent numerical agreement with the values obtained
for $\beta_w$ in [15]. We were therefore tempted to believe that our lower bounds from [44] are tight. In a
follow-up paper [42] we then presented a mechanism that can be used to obtain matching upper bounds,
therefore formally establishing the results from [44] as an alternative ultimate performance characterization
of (3). Alternatively, in [41] we provided a rigorous analytical matching of the $\beta_w$ threshold
characterizations from [44] and those given in [14]. The following theorem summarizes the results we obtained
in [42, 44].

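Theorem 2. (Exact $\ell_1$ $(\beta_w, \alpha_w)$ threshold; probabilistic approach of [42, 44]) Let $A$ be an
$m \times n$ matrix in (1) with i.i.d. standard normal components. Let $\tilde{x}$ in (1) be $k$-sparse.
Further, let the location and signs of nonzero elements of $\tilde{x}$ be arbitrarily chosen but fixed. Let
$k, m, n$ be large and let $\alpha = \frac{m}{n}$ and $\beta_w = \frac{k}{n}$ be constants independent of $m$
and $n$. Let erfinv be the inverse of the standard error function associated with zero-mean unit variance
Gaussian random variable. Further, let all $\epsilon$'s below be arbitrarily small constants.

1. Let $\hat{\theta}_w$, ($\beta_w \leq \hat{\theta}_w \leq 1$) be the solution of

$$(1-\epsilon_1^{(c)})(1-\beta_w)\frac{\sqrt{\frac{2}{\pi}}\, e^{-\left(\mbox{erfinv}\left(\frac{1-\theta_w}{1-\beta_w}\right)\right)^2}}{\theta_w} - \sqrt{2}\,\mbox{erfinv}\left((1+\epsilon_1^{(c)})\frac{1-\theta_w}{1-\beta_w}\right) = 0. \qquad (12)$$

If $\alpha$ and $\beta_w$ further satisfy

$$\alpha > \frac{1-\beta_w}{\sqrt{2\pi}}\left(\sqrt{2\pi} + 2\frac{\sqrt{2\left(\mbox{erfinv}\left(\frac{1-\hat{\theta}_w}{1-\beta_w}\right)\right)^2}}{e^{\left(\mbox{erfinv}\left(\frac{1-\hat{\theta}_w}{1-\beta_w}\right)\right)^2}} - \sqrt{2\pi}\frac{1-\hat{\theta}_w}{1-\beta_w}\right) + \beta_w - \frac{\left((1-\beta_w)\sqrt{\frac{2}{\pi}}\, e^{-\left(\mbox{erfinv}\left(\frac{1-\hat{\theta}_w}{1-\beta_w}\right)\right)^2}\right)^2}{\hat{\theta}_w} \qquad (13)$$

then with overwhelming probability the solution of (3) is the $k$-sparse $\tilde{x}$ from (1).

2. Let $\hat{\theta}_w$, ($\beta_w \leq \hat{\theta}_w \leq 1$) be the solution of

$$(1+\epsilon_1^{(c)})(1-\beta_w)\frac{\sqrt{\frac{2}{\pi}}\, e^{-\left(\mbox{erfinv}\left(\frac{1-\theta_w}{1-\beta_w}\right)\right)^2}}{\theta_w} - \sqrt{2}\,\mbox{erfinv}\left((1-\epsilon_1^{(c)})\frac{1-\theta_w}{1-\beta_w}\right) = 0. \qquad (14)$$

If on the other hand $\alpha$ and $\beta_w$ satisfy

$$\alpha < \frac{1}{(1+\epsilon_1^{(m)})^2}\left((1-\epsilon_1^{(g)})\left(\hat{\theta}_w + \frac{2(1-\beta_w)}{\sqrt{2\pi}}\frac{\sqrt{2\left(\mbox{erfinv}\left(\frac{1-\hat{\theta}_w}{1-\beta_w}\right)\right)^2}}{e^{\left(\mbox{erfinv}\left(\frac{1-\hat{\theta}_w}{1-\beta_w}\right)\right)^2}}\right) - \frac{\left((1-\beta_w)\sqrt{\frac{2}{\pi}}\, e^{-\left(\mbox{erfinv}\left(\frac{1-\hat{\theta}_w}{1-\beta_w}\right)\right)^2}\right)^2}{\hat{\theta}_w (1+\epsilon_3^{(g)})^{-2}}\right) \qquad (15)$$

then with overwhelming probability there will be a $k$-sparse $\tilde{x}$ (from a set of $\tilde{x}$'s with
fixed locations and signs of nonzero components) that satisfies (1) and is not the solution of (3).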

Proof. The first part was established in [44] and the second one was established in [42]. An alternative way
of establishing the same set of results was also presented in [41]. $\Box$

Below we provide a more informal interpretation of what was established by the above theorem. Assume
the setup of the above theorem. Let $\alpha_w$ and $\beta_w$ satisfy the following:

Fundamental characterization of the $\ell_1$ performance:

$$(1-\beta_w)\frac{\sqrt{\frac{2}{\pi}}\, e^{-\left(\mbox{erfinv}\left(\frac{1-\alpha_w}{1-\beta_w}\right)\right)^2}}{\alpha_w} - \sqrt{2}\,\mbox{erfinv}\left(\frac{1-\alpha_w}{1-\beta_w}\right) = 0. \qquad (16)$$

Then:

1) If $\alpha > \alpha_w$ then with overwhelming probability the solution of (3) is the $k$-sparse
$\tilde{x}$ from (1).

2) If $\alpha < \alpha_w$ then with overwhelming probability there will be a $k$-sparse $\tilde{x}$ (from a
set of $\tilde{x}$'s with fixed locations and signs of nonzero components) that satisfies (1) and is not the
solution of (3).

As mentioned above, in [41] we established that the characterizations given in Theorems 1 and 2
are analytically equivalent, which essentially makes (16) the ultimate performance characterization of
$\ell_1$-optimization when it comes to its use in finding the sparse solutions of random under-determined
linear systems.
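
The equivalence just mentioned can also be probed numerically. The sketch below (assuming NumPy and SciPy)
first solves (16) for $\beta_w$ at a given $\alpha$ and then evaluates the geometric exponent
$\Psi_{net}$ from (5) at that pair. Everything here is our own reading of the formulas, not code from the
cited works: in particular the relation $y_{\gamma} = \frac{\gamma}{1-\gamma}s_{\gamma}$, the form of
$\Psi_{com}$, and the function names are assumptions made for illustration.

```python
import numpy as np
from scipy.optimize import brentq, minimize_scalar
from scipy.special import erf, erfcx, erfinv


def beta_w_probabilistic(alpha):
    """Solve (16) for beta_w at a given alpha (root is bracketed in (0, alpha))."""
    def residual(beta):
        e = erfinv((1 - alpha) / (1 - beta))
        return (1 - beta) * np.sqrt(2 / np.pi) * np.exp(-e**2) / alpha - np.sqrt(2) * e
    return brentq(residual, 1e-9, alpha - 1e-9)


def entropy(p):
    return -p * np.log(p) - (1 - p) * np.log(1 - p)


def psi_net(beta, alpha):
    """Exponent Psi_com - Psi_int - Psi_ext of (5), under the assumptions stated above."""
    gamma = beta / alpha
    psi_com = (alpha - beta) * np.log(2) + (1 - beta) * entropy((alpha - beta) / (1 - beta))
    # (10): Phi(s) = (1 - gamma) phi(s) / s, using Phi(s)/phi(s) = sqrt(pi/2) erfcx(s/sqrt(2)).
    s = brentq(lambda t: t * np.sqrt(np.pi / 2) * erfcx(t / np.sqrt(2)) - (1 - gamma),
               1e-12, 1e3)
    y = gamma * s / (1 - gamma)  # assumed link between y_gamma and s_gamma
    xi = -0.5 * y**2 * (1 - gamma) / gamma - 0.5 * np.log(2 / np.pi) + np.log(y / gamma)
    psi_int = (alpha - beta) * (xi + np.log(2))
    # (11): one-dimensional convex minimization over y > 0.
    ext = minimize_scalar(lambda v: alpha * v**2 - (1 - alpha) * np.log(erf(v)),
                          bounds=(1e-6, 10.0), method="bounded")
    return psi_com - psi_int - ext.fun


for alpha in (0.3, 0.5, 0.7):
    beta = beta_w_probabilistic(alpha)
    print(f"alpha={alpha:.1f}  beta_w={beta:.4f}  psi_net={psi_net(beta, alpha):+.2e}")
```

If the reading of the formulas is right, the printed exponent should be numerically indistinguishable from
zero at the pair produced by (16) and negative at smaller $\beta$.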

3 Approximate message passing - AMP 

In this section we briefly revisit a novel approach for solving (1). The approach was introduced in [16]. It
is essentially an iterative algorithm:

$$x^{(t+1)} = \eta_t(A^T z^{(t)} + x^{(t)})$$
$$z^{(t)} = y - Ax^{(t)} + \frac{1}{\alpha} z^{(t-1)}\,\mbox{Avg}(\eta'_{t-1}(A^T z^{(t-1)} + x^{(t-1)})). \qquad (17)$$

$\eta_t$ is a scalar function which operates component-wise on vectors and $\eta'_t$ is the first derivative
of $\eta_t$ with respect to its scalar argument. Avg is a function that computes the average value of the
components of its vector argument. The algorithm is iterative and a stopping criterion should be specified as
well. There are many ways this can be done; for example one can stop the algorithm when a norm of the
difference between two successive $x$'s is small compared to their own norms. The more important question is
why this algorithm would have good performance. In the absence of the term
$\frac{1}{\alpha} z^{(t-1)}\,\mbox{Avg}(\eta'_{t-1}(A^T z^{(t-1)} + x^{(t-1)}))$ the algorithm boils down to
the class of iterative thresholding algorithms considered in e.g. [31]. These algorithms have solid recovery
abilities and are very fast. The algorithm (17) is obviously also very easy to implement and has a
substantially lower running complexity than BP. Using a state evolution formalism, in [16] a fairly precise
performance characterization of (17) when used for finding $\tilde{x}$ in (1) was given. Namely, in [16] the
authors established that

$$\beta_w^{(amp)} = \alpha_w^{(amp)} \max_{z \geq 0}\left(\frac{1 - \frac{2}{\alpha_w^{(amp)}}\left((1+z^2)\Phi(z) - z\phi(z)\right)}{1 + z^2 - 2\left((1+z^2)\Phi(z) - z\phi(z)\right)}\right), \qquad (18)$$

with $\beta_w^{(amp)}$ and $\alpha_w^{(amp)}$ having meanings similar to those of $\beta_w$ and $\alpha_w$
from the previous section. Moreover, in [5] the state evolution formalism was proved to hold, thereby
establishing the findings of [16] as rigorous. We summarize the above results in the following theorem.

Theorem 3. (Exact AMP $(\beta_w^{(amp)}, \alpha_w^{(amp)})$ threshold; AMP approach of [5, 16]) Let $A$ be an
$m \times n$ matrix in (1) with i.i.d. standard normal components. Let $\tilde{x}$ in (1) be $k$-sparse and
given. Let $k, m, n$ be large and let $\alpha_w^{(amp)} = \frac{m}{n}$ and $\beta = \frac{k}{n}$ be constants
independent of $m$ and $n$. Let $\Phi(z)$ and $\phi(z)$ be as defined in (7). Let $\beta_w^{(amp)}$ be as
defined in (18). Then there is a suitable function $\eta_t$ in (17) (e.g. a properly tuned simple soft
thresholding function would suffice) such that:

1) If $\beta < \beta_w^{(amp)}$ the solution of (17) is the $k$-sparse $\tilde{x}$ in (1) with overwhelming
probability.

2) If $\beta > \beta_w^{(amp)}$ the solution of (17) is not the $k$-sparse $\tilde{x}$ in (1) with
overwhelming probability.

Proof. The algorithm as well as the general finding were established in [16]. The mathematical correctness
was established in [5]. $\Box$
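
To illustrate how little code iteration (17) needs, here is a minimal Python sketch (assuming NumPy) with a
soft thresholding $\eta_t$. Several choices below are our own assumptions rather than anything prescribed
above: the matrix is rescaled by $\frac{1}{\sqrt{m}}$ so that its columns have roughly unit norm, the
threshold is set to a fixed multiple `tau` of an empirical estimate of the residual level, and a fixed
iteration count replaces a proper stopping rule.

```python
import numpy as np


def soft_threshold(u, theta):
    """Component-wise soft thresholding; its derivative is the indicator |u| > theta."""
    return np.sign(u) * np.maximum(np.abs(u) - theta, 0.0)


def amp(A, y, tau=2.0, iters=60):
    """Sketch of iteration (17) with a soft-thresholding eta_t (tau is a hypothetical tuning)."""
    m, n = A.shape
    alpha = m / n
    x = np.zeros(n)
    z = y.copy()
    for _ in range(iters):
        theta = tau * np.linalg.norm(z) / np.sqrt(m)   # threshold tied to residual size
        u = A.T @ z + x
        x = soft_threshold(u, theta)
        # Correction term of (17): (1/alpha) * z * Avg(eta'(A^T z + x)).
        correction = z * np.mean(np.abs(u) > theta) / alpha
        z = y - A @ x + correction
    return x


# Toy instance (hypothetical sizes; beta/alpha = 0.1 is well below the curve in (18)).
rng = np.random.default_rng(3)
m, n, k = 500, 1000, 50
A = rng.standard_normal((m, n)) / np.sqrt(m)   # assumed normalization
x_true = np.zeros(n)
x_true[rng.choice(n, size=k, replace=False)] = rng.standard_normal(k)
x_hat = amp(A, A @ x_true)
rel_err = np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true)
print("relative error:", rel_err)
```

Dropping the `correction` line turns the loop into plain iterative soft thresholding, which is the comparison
made in the text above.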

Moreover, in [16] it was established that the characterization given in (18) actually analytically matches
the characterization given in (16) (and, based on the findings of [41], automatically the one obtained by
Donoho and given in Theorem 1). All in all, based on everything we mentioned above one is essentially left
with a single characterization that determines the performance of both the $\ell_1$-optimization algorithm
from (3) and the AMP algorithm from (17). Below, in Figure 1, we present the characterization in the
$(\beta, \alpha)$ plane.

Figure 1: Weak threshold, $\ell_1$-optimization (BP), AMP
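
Since the figure itself is not reproduced here, the curve can be regenerated numerically. The following
Python sketch (assuming NumPy and SciPy) evaluates (18) on a fine grid in $z$ and, separately, solves (16) by
root finding, so the two characterizations can be compared side by side. The function names, the grid, and
the bracketing interval are illustrative assumptions of ours.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.special import erfinv
from scipy.stats import norm


def beta_w_l1(alpha):
    """Solve (16) for beta_w at a given alpha."""
    def residual(beta):
        e = erfinv((1 - alpha) / (1 - beta))
        return (1 - beta) * np.sqrt(2 / np.pi) * np.exp(-e**2) / alpha - np.sqrt(2) * e
    return brentq(residual, 1e-9, alpha - 1e-9)


def beta_w_amp(alpha, z_max=8.0, points=80001):
    """Evaluate (18) by a dense grid search over z (Phi in (7) is the Gaussian tail)."""
    z = np.linspace(1e-3, z_max, points)
    g = (1 + z**2) * norm.sf(z) - z * norm.pdf(z)
    ratio = (1 - 2 * g / alpha) / (1 + z**2 - 2 * g)
    return alpha * ratio.max()


for alpha in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(f"alpha={alpha:.1f}  l1: {beta_w_l1(alpha):.5f}  amp: {beta_w_amp(alpha):.5f}")
```

Plotting either column against $\alpha$ gives a curve of the kind Figure 1 refers to; the two columns are
expected to agree up to the resolution of the grid.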



4 Revisiting challenges 

In the previous sections we revisited algorithmic and theoretical results that we view as the most successful
currently available when it comes to recovering $\tilde{x}$ in (1). However, if one now looks at the timeline
of all these results one can observe that since the original work of Donoho [14, 15] appeared almost 10 years
ago not much has changed in the performance characterizations. Of course not much can be changed: Donoho
actually determined the performance characterization of the $\ell_1$-optimization. What we really mean when
we say not much has changed is that there have not been alternative characterizations that go above the one
presented in Figure 1. One can object even to this statement. Namely, there are various special cases when
the characterizations can be lifted (see, e.g. [45] or various other papers that deal with reweighted
$\ell_1$ type of algorithms [10, 11, 40]). Still, while there are scenarios where the characterizations can be
lifted, we have not yet seen what we would consider a "universal" lift of the characterization given in
Figure 1. When we say universal, we actually mean that the characterization should faithfully portray an
algorithm's performance over a fairly uniform choice of $\tilde{x}$ (or even over all $\tilde{x}$). For
example, while all reweighted versions of $\ell_1$ typically provide substantial improvement over $\ell_1$
they typically fail to do so when $\tilde{x}$ has binary nonzero components. This of course raises a question
as to what one can/should consider as a universal improvement over $\ell_1$ and a fairly uniform choice of
$\tilde{x}$. In our view, to quantify the uniformity of $\tilde{x}$ for which we expect algorithms to work,
the characterization formulation given in Theorem 2 could be somewhat useful. Namely, borrowing parts of the
setup of such a formulation one can pose the following problem:

Question 1: Let $A$ be an $\alpha n \times n$ matrix with i.i.d. standard normal components. Let $\tilde{x}$
be a $\beta n$-sparse $n$-dimensional vector from $\mathbb{R}^n$ and let the signs and locations of its
non-zero components be arbitrarily chosen but fixed. Moreover, let pair $(\beta, \alpha)$ reside in the area
above the curve given in Figure 1. Can one then design a polynomial algorithm that would with overwhelming
probability (taken over randomness of $A$) solve (1) for all such $\tilde{x}$?
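
The setup of Question 1 is easy to simulate, which gives a feel for what any candidate algorithm would have
to beat. The Python sketch below (assuming NumPy and SciPy) fixes the locations and signs of the nonzeros
once, redraws a Gaussian $A$ in every trial, and records how often the $\ell_1$ program (3) returns the
planted vector. The helper names, the problem size, the number of trials, and the two test values of
$\beta$ are our own illustrative choices; finite-size runs like this only hint at the asymptotic statement.

```python
import numpy as np
from scipy.optimize import linprog


def l1_recovers(A, x_true, tol=1e-5):
    """True if the LP form of (3) returns x_true up to a relative tolerance."""
    m, n = A.shape
    res = linprog(np.ones(2 * n), A_eq=np.hstack([A, -A]), b_eq=A @ x_true,
                  bounds=(0, None), method="highs")
    if not res.success:
        return False
    x_hat = res.x[:n] - res.x[n:]
    return np.linalg.norm(x_hat - x_true) <= tol * np.linalg.norm(x_true)


def success_rate(alpha, beta, n=100, trials=10, seed=0):
    """Empirical success frequency of (3) over fresh Gaussian A, with x held fixed."""
    rng = np.random.default_rng(seed)
    m, k = int(round(alpha * n)), int(round(beta * n))
    # Locations and signs of the nonzeros are drawn once and then kept fixed.
    x_true = np.zeros(n)
    support = rng.choice(n, size=k, replace=False)
    x_true[support] = rng.choice([-1.0, 1.0], size=k)
    wins = sum(l1_recovers(rng.standard_normal((m, n)), x_true) for _ in range(trials))
    return wins / trials


print("alpha=0.5, beta=0.10 (l1 expected to succeed):", success_rate(0.5, 0.10))
print("alpha=0.5, beta=0.35 (l1 expected to fail):   ", success_rate(0.5, 0.35))
```

An affirmative answer to Question 1 would correspond to a polynomial algorithm whose success frequency stays
near one in the second regime as $n$ grows.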

The idea is then that if the answer to the above question is yes then we "agree" that an improvement
over $\ell_1$ has been made. Of course, it is not really clear if the above question is the best possible way
to assess a potential improvement over $\ell_1$. Essentially, in our view, it is an individual assessment what
constitutes an improvement and what does not. For us, for example, it is actually even hard to explain what
we would consider an improvement. Since this is a mathematical paper, the question posed above is an
attempt to mathematically characterize it. However, practically speaking, it is rather something that cannot
be described precisely but would be obvious to recognize when presented. From that point of view,
the above question is just a reflection of our success/failure in finding a way to fit our feeling into an
exact mathematical description. We do believe that over time one can develop a better formulation but until
then we will rely on the one given above and on a bit of subjective individual feeling. Along the same lines
then, everything that we write below should in a way be prefaced by such a statement.

4.1 Restrictions 

There are several comments that we believe are in order. They refer, in the first place, to the restrictions
we posed in the above question.

1. In the posed question we insist that the components of $A$ are i.i.d. standard normal random variables.
That may not necessarily be the right way to capture the universal capabilities of $\ell_1$ or, for that
matter, the universal capabilities of any other algorithm. Still, it is our belief that such a statistical
choice is the least harmful. In other words, if we assume that $A$ has a different type of randomness one
then may ask why such a randomness is any more universal than, say, Gaussian. While we indeed restricted the
randomness of $A$ we believe that we did it in a fairly harmless way.

2. Another restriction that we introduced is the restriction on $\tilde{x}$. This restriction may be a bit
problematic if, for example, one works hard to select a particularly good/bad set of non-zero locations for a
particularly good/bad matrix $A$. However, if $A$ is comprised of i.i.d. standard normals then this choice
seems harmless as well. Of course, if a different $A$ is to be considered then restricting signs and locations
can substantially bias $\tilde{x}$. Also, although it is not necessary, we suggest that one uniformly randomly
select locations and signs and then fix them (given the rotational invariance of the rows of $A$ this may
sound unnecessary, i.e. one can alternatively take any set of non-zero locations and any combination of
signs). However, one eventually may want to upgrade Question 1 to include different matrices $A$ and
then a random choice of locations and signs of $\tilde{x}$ may be needed.

3. Our choice of polynomial algorithms can also be problematic. For example, there are many algorithms
that are provably polynomial but with a running time that can hardly ever be achieved in practice. More
importantly, by insisting that the algorithms are polynomial we are potentially excluding some of the
randomized algorithms or those whose running time depends on the values of the input (which in our case
are random!). This is probably one of the major issues with Question 1. It is possible that not much
would change even if we allow, say, algorithms that are with overwhelming probability (taken over
their own randomness or even over the randomness of the problem itself or even over both of them)
polynomial. For example, if AMP were able to give a performance characterization higher than the
one that BP gives, the answer to Question 1 would still not be yes. One would have to argue that
AMP is a polynomial algorithm. That is exactly where the problems of polynomiality may appear.
One could occasionally have problems arguing that typically super-fast randomized algorithms are in the
worst case polynomial. Moreover, one should as well be careful how the worst case is interpreted,
i.e. whether it is interpreted over problem instances or over the algorithm's own randomness. We do, however,
believe that if polynomiality is too stiff a restriction one can relax it to polynomial with overwhelming
probability, where, as mentioned above, randomness would be over both the problem instances and the
potential random structure of the algorithm.

4.2 Redirecting a challenge 

If one can come to terms with the deficiencies of the question that we posed then it may not be a bad idea to
revisit the timeline of the problem it addresses. As is well known, under-determined linear systems with
sparse solutions have been around for a long time. Consequently, a host of ways to attack them is known (in
fact, we briefly discussed some of them in Section 1). For a long time it had been a prevalent opinion that
BP is a solid heuristic when it comes to increasing recoverable sparsity. Such a popular belief was
analytically justified for the first time in the seminal works [8, 14, 15]. Moreover, the results of
[14, 15] in a large dimensional and statistical context provided the exact performance characterization of
BP. The initial success of [8, 14, 15] then generated enormous interest in sparse problems in many different
fields. The set of achieved results seems inexhaustible and appears to be growing on a daily basis.
Impressive results have been achieved across a variety of disciplines and range from various algorithmic
implementations to specific applications and needed adaptations.

Our own interest is on a purely mathematical level. From a purely mathematical point of view, Question
1 (with all its above mentioned deficiencies) in our mind stands as a key test on the path of almost any
improvement in recoverable sparsity characterizations. Providing the answer yes to Question 1 seems to us
to be basically a guarantee that a mathematical improvement is possible. Now, looking back at what was
done in the last 10 years that relates at its core to Question 1, two lines of work that we mentioned in the
previous sections are of particular interest. One is the line that follows the design and analysis of AMP and
the other one is our own revisit of BP. However, not much progress seems to have been made as far as
providing the answer yes to Question 1 in any of these lines (and for that matter in any other line of work
known to us). Namely, while both sets of results, [5, 16] and [42, 44], are incredible feats on their own, not
only are they not moving the characterization obtained by Donoho in [14, 15], they are actually reestablishing
it in a different way. Reestablishing Donoho's results is of course a fine mathematical achievement. However,
when viewed through the prism of establishing the answer yes to Question 1, reestablishing Donoho's results is
a somewhat pessimistic kind of progress.

More specifically, our own results, for example, in a way hint that the best one can do through a convex
type of relaxation is probably what BP does. On the other hand, the situation may be even worse if one looks
at AMP and results obtained in JSKHl. It is almost unbelievable that a different algorithm (in this case the 
AMP) achieves exactly the same performance as BP. Since it does happen, one naturally wonders how it is
possible. One simple explanation would be that AMP essentially just solves BP, though in a very clever and efficient
way (if this turned out to be true then, as far as moving up the curve in Figure 1 is concerned,
things may not be overly pessimistic). On the other hand, if AMP is indeed a fundamentally different
approach then one may start wondering whether or not lifting the curve in Figure 1 is actually possible within
the frame of Question 1. And since there is currently really no evidence either way, one simply wonders if it
is already time to start looking at Question 1 with the idea of providing the answer no.

Since we have not looked at Question 1 from that perspective we cannot really comment much as to
what the chances are that the answer is indeed no. On the other hand, it has been almost 10 years since
Donoho created his results, almost 5 years since we created our own, and probably as long since the results
of [16] were created. Given the massive interest that this field has seen in the last 10 years, one would expect
that if the answer to Question 1 were yes then it would have been established already. Of course, one can then
alternatively argue the other way around as well. Namely, if the answer to Question 1 is no, wasn't there
enough time during the last decade to establish it? We of course do not know if there was enough time for
establishing any definite answer to Question 1. However, it is our belief that the majority of mathematical
work was concentrated on establishing results that would imply the answer yes to Question 1. If our belief is
even remotely close to the truth then one can realistically ask if it is really time to redirect the challenge
and try to look at ways that would lead to providing the answer no to Question 1.

As we have stated above, we do not know what the answer to Question 1 is. However, given that we
expressed our belief that it is not impossible that the answer is actually no, it would in fact be reasonable
for us to provide at least some information as to which way we are leaning. Well, our position is somewhat
funny but certainly worth sharing: we work believing that the answer is yes, but if we were to bet we would
bet that the answer is no. Of course this position is massively hedged, but in our view it seems reasonable.
Namely, if it turns out that the answer is yes we would need to pay but would in return get to see the show,
which seems like a pretty nice option (if there is to be a show we firmly believe that it must be a big one!). On
the other hand, if there is no show we would get over-reimbursed for the ticket we actually never had, which
is not that bad either. As far as our preference goes though, we would still prefer to see the show!

5 Further considerations 

5.1 What after Question 1 

In the previous section we discussed a possible shift in the approach to answering Question 1. A very 
important point to make is that even if one is able to answer Question 1 the whole story is not over. In this 
subsection we present what in our view would be further points of interest once Question 1 is settled. 

If it turns out that the answer is no, then in a way the value of the majority of the work done in the previous
decade would be even higher. As we have mentioned above, the majority of the work done in the last decade was
related to polynomial algorithms (or those that are highly likely to be polynomial) and the part of the $(\beta, \alpha)$
plane below the curve given in (16) and Figure 1. In that sense the contribution of the line of work initiated
in milll would be of an incredible value. 

On the other hand, if it turns out that the answer to Question 1 is yes then naturally a variety of further
questions will appear. The first one in our mind would be:

Question 2: Assuming that the answer to Question 1 is yes, can one then determine an alternative curve,
say $(\beta^{(opt)}, \alpha^{(opt)})$, for which the answer to Question 1 is no? Along the same lines, can it happen that there
is no such curve that is below the straight line at 1?

Then one can go further and, assuming that the answer to the first part of Question 2 is yes but the answer
to the second part of Question 2 is no, ask the following:

Question 3: Assuming that the answer to the first part of Question 2 is yes, can one then lower the curve
$(\beta^{(opt)}, \alpha^{(opt)})$ until the answer to Question 1 is no?

Settling all these questions would in our mind be a way to deepen our understanding of the polynomial
solvability of under-determined linear systems with sparse solutions.

5.2 What after Questions 2 and 3

An important scenario that may play out when settling the above questions is that the ultimate curve
$(\beta^{(opt)}, \alpha^{(opt)})$ (under the premises of Question 1) is not the straight line at 1 (for example, the answer no
to Question 1 immediately forces such a scenario). Such a scenario would be a great opportunity to revive
the study of random hardness within the current complexity theory framework. In our mind such a view of
the hardness portion of traditional complexity theory is an important aspect, both practically and theoretically.
Unfortunately, it seems a bit premature to start looking at it right now for a variety of reasons. First,
even in general complexity theory there are fewer results that relate to random hardness than to the typical
notion of worst-case hardness/completeness. Second, we are not even sure that the current setup of random
hardness/completeness has been well established/investigated even for far more popular optimization or
decision problems.

Still, it is important to note that if one starts attacking Question 1 with an ambition to show that the
answer is no, then the above mentioned random hardness concepts should probably be revisited and their
meaning re-examined and quite possibly even adapted to better fit the scope of the story presented here.
The idea of this paper is just to hint that it may be time to think about other directions when it comes
to studying linear systems. We consequently refrain from a further detailed discussion of this here,
but mention that all these problems seem to be at the cutting edge of what we envision as a future prospect for
studying under-determined systems with sparse solutions.

6 Conclusion 

In this paper we revisited under-determined systems of linear equations with sparse solutions. We looked at
a particular type of mathematical problem that arises when studying such systems. Namely, we looked at
characterizations of the relation between the size of the system and the sparsity of the solution under which the
systems are solvable in polynomial time.

We started by giving a brief overview of the results that we considered mathematically most important
for the direction of study that we wanted to popularize. We then made several observations related to the pace
of progress made in the last 10 years. When it comes to studying polynomial algorithms and their ability
to solve a class of random under-determined linear systems, our main observation is that there has been
somewhat limited progress as to what the ultimate performance characterization of such algorithms is. We
then raised a question which in a way asks whether it is possible that the performance characterizations of
two known algorithms (namely, BP and AMP) could in fact be the optimal ones when it comes to polynomial
algorithms. We believe that this will stimulate further discussion in this direction in a host of mathematical
fields.